# Multimodal Speech Processing

## Ultravox v0.4 Llama 3.1 70B

Ultravox is a multimodal speech large language model built on the pre-trained Llama3.1-70B-Instruct and Whisper-medium backbones. It can accept speech and text together in the same input.

Tags: Transformers, Supports multiple languages
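To make "speech and text together in the same input" concrete, here is a toy sketch of one common way a speech LLM consumes both modalities: audio-encoder frames are mapped into the language model's embedding space and spliced into the text embedding sequence at the position of an audio placeholder token. The `<|audio|>` placeholder name is assumed from the Ultravox model card; the function names, the tiny 2- and 3-dimensional vectors, and the single hand-picked linear map (standing in for a learned projector between the Whisper-medium encoder and Llama 3.1) are hypothetical and are not Ultravox's actual code.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Assumption: placeholder token name taken from the Ultravox model card convention.
AUDIO_PLACEHOLDER = "<|audio|>"


def project(frame: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Linear map from audio-encoder width to LLM embedding width.

    Hypothetical stand-in for a learned projector: each output element is
    the dot product of one weight row with the input frame.
    """
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def build_input_embeddings(
    tokens: List[str],
    text_embed: Dict[str, Vector],
    audio_frames: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
) -> List[Vector]:
    """Replace the single audio placeholder with one projected vector per audio frame."""
    if tokens.count(AUDIO_PLACEHOLDER) != 1:
        raise ValueError("expected exactly one audio placeholder")
    sequence: List[Vector] = []
    for tok in tokens:
        if tok == AUDIO_PLACEHOLDER:
            # Speech path: every encoder frame becomes one LLM-space vector.
            sequence.extend(project(frame, weights) for frame in audio_frames)
        else:
            # Text path: ordinary embedding-table lookup.
            sequence.append(text_embed[tok])
    return sequence


# Toy usage: 2-dim "LLM" embeddings, 3-dim "audio encoder" frames.
text_embed = {"Transcribe": [1.0, 0.0], ":": [0.0, 1.0]}
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # maps 3-dim frames to 2 dims
frames = [[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]]
seq = build_input_embeddings(["Transcribe", ":", AUDIO_PLACEHOLDER], text_embed, frames, weights)
print(seq)  # [[1.0, 0.0], [0.0, 1.0], [1.0, 5.0], [0.0, 2.0]]
```

In this sketch the text path is untouched: only the placeholder position changes, so the same tokenizer and embedding table serve both text-only and speech-plus-text prompts, and the resulting sequence grows with the number of audio frames.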

## Llama 3 Typhoon v1.5 8B Audio Preview

Typhoon-Audio Preview is a Thai and English audio-language model that accepts text and audio inputs and produces text outputs.


© 2025 AIbase